Bounded Concurrent Time-Stamp Systems Are Constructive


Danny  Dolev*  Nir  Shavit* 


Abstract 

Concurrent time stamping is at the heart of solutions
to some of the most fundamental problems in distributed
computing. Based on concurrent-time-stamp-systems,
elegant and simple solutions to core problems such as
fcfs-mutual-exclusion, construction of a
multi-reader-multi-writer atomic register, probabilistic
consensus,... were developed. Unfortunately, the only
known implementation of a concurrent time stamp system
has been theoretically unsatisfying, since it requires
unbounded size time-stamps, in other words, unbounded
memory. Not knowing if bounded
concurrent-time-stamp-systems are at all constructive,
researchers were led to constructing complicated
problem-specific solutions to replace the simple
unbounded ones. In this work, for the first time, a
bounded implementation of a concurrent-time-stamp-system
is presented. It provides a modular unbounded-to-bounded
transformation of the simple unbounded solutions to
problems such as above. It allows solutions to two
formerly open problems, the
bounded-probabilistic-consensus problem of Abrahamson
[A88] and the fifo-$\ell$-exclusion problem of [FLBB85],
and a more efficient construction of mrmw atomic
registers.

1  Introduction 

The paradigm of concurrent time stamping is at the heart
of solutions to some of the most fundamental problems in
coordination of concurrent processes [A88, CIL87, D65,
DGS88, H88, L74, UB87, VA86].

A time stamp system of n asynchronous processes is
traditionally conceived as consisting of n label
registers, one per process, written by it and read by
all others. The labels are unbounded natural numbers,
where each process can execute infinitely many labeling
and scan operations on the label registers. A labeling
operation is a sequence of reads of other labels,
followed by a write of a label greater than the maximal
value read. The label values written establish a total
order on all labeling operations ever executed. A scan
operation is a sequence of reads of all processes'
labels, returning a subset of labels ordered
consistently with this total ordering. A
concurrent-time-stamp-system (ctss) is a
time-stamp-system in which any number of labeling or
scan operations (by different processes) may overlap in
time. A major requirement is that labeling and scan
operations of any process be waitfree, that is,
completed in finite time independently of the pace of
other processes.

Concurrent time stamping is the basis for simple
solutions to a wide variety of the basic problems in
concurrency control. Examples of such problems include
fcfs-mutual-exclusion, construction of a
multi-reader-multi-writer atomic register, probabilistic
consensus,... Unfortunately, the only known
implementation of the above paradigm is based on labels
of unbounded size. This is a major drawback, since
bounded memory size is a key requirement of the problems
at hand, implying these elegant and simple unbounded
solutions have little theoretical value. Since it was
unknown whether bounded concurrent-time-stamp-systems
are constructive, researchers were led to devising
complicated problem-specific solutions to show that the
above problems are solvable in a bounded way [B87, BP87,
CIL87, D65, DGS88, FLBB79, FLBB85, K78, L74, L86d, LH88,
LV88, R86, P81, P83, PB87, VA86].

Israeli and Li in [IL87] were the first to isolate the
notion of bounded-time-stamping as an independent
concept, developing an elegant theory of bounded
sequential-time-stamp-systems, that is, time-stamp
systems in a world where no two operations are ever
concurrent. They also devised a concurrent labeling
scheme in which the labels provide a causality
preserving relation. However, this relation is not a
total ordering since unrelated labels and cycles are
possible. Moreover, this scheme deals only with
labeling, and does not address the central problem of
how labels can be scanned concurrently, therefore
lacking some of the key properties of
concurrent-time-stamp-systems.

In this paper, for the first time, a bounded
construction of a concurrent-time-stamp-system is
presented. It allows a modular transformation of the
simple unbounded solutions to such core problems as
above¹. It provides a powerful tool, enabling the design
of simple unbounded concurrent-time-stamp based
algorithms, with the knowledge that such unbounded
solutions immediately imply the bounded ones². This is
exemplified by providing the basis to solutions of the
above flavor [ADMS88, ADS89] to two formerly open
problems, the bounded-probabilistic-consensus problem of
[A88] (requiring to solve the probabilistic-consensus
problem of [CIL87] without using an atomic coin flip
operation) and the fifo-$\ell$-exclusion problem of
[FLBB79].

1  See  Appendix  A. 

2 Bounded time-stamp algorithms for a message passing
environment without faults are very similar to that
described in this paper. Lack of space prevents us from
describing it.


The only known solutions to the latter problem
[DGS88, P88] achieve weaker forms of fairness than the
original test and set based solution of [FLBB79].

Though one might think that the price of introducing
such a powerful modular transformation would be a blowup
in memory size or number of operations, this is hardly
the case. The construction presented in the paper
requires n registers of $O(n)$ bits each, meeting the
lower bound of [IL87] for sequential-time-stamp-system
construction. Though because of lack of space, a
complete comparison table cannot be provided in this
paper, one example of the efficiency of the ctss
solutions is given by the famous problem of
multi-reader-multi-writer atomic register construction.
A simple solution based on transforming the unbounded
[VA86] protocol (see Appendix A for a description) has
the same space complexity as the only proven algorithm
[PB87, S88], yet a better time complexity: $O(n)$ memory
accesses for a write and $O(n \log n)$ for a read, as
compared with $O(n^2)$ for either in the former.
Concurrent time stamp systems are informally defined in
Section 2, and implemented in Section 3. Rigorous formal
definitions and correctness proofs based on the
formalism of Lamport [L86a, L86c] will be presented in
the full paper.

2  Concurrent  Time  Stamping 

To provide the reader with a better intuition for the
more abstract formal definitions presented later, the
properties of a concurrent-time-stamp-system are first
outlined informally via the example of its unbounded
natural-number based implementation.

Informally, the natural-number based ctss consists of n
registers of unbounded size, each written by one of n
asynchronous processes and read by all others. The
labels are natural numbers with the usual ordering among
them³. Each process can execute infinitely many labeling
or scan operations, any number of them concurrently with
the operations of other processes. The scan
3 Process id's are added lexicographically to break
symmetry, a well known technique which will be referred
to in the sequel.




is the operation of collecting a set of labels $\ell$,
one of each process, by executing a sequence of reads of
the labels in an arbitrary order. The labeling operation
is simply a collecting of all the labels followed by a
write of $\max(\ell) + 1$. The labels written during
labeling operations are monotonically increasing, and,
though some were possibly created concurrently with
others, define a total order on all labeling operations
ever performed. Since for any two labeling operations
that are non-concurrent, the order among the labels
reflects the order among the operations, this order
defines the manner in which all labeling operations
could be serialized. Though no process ever knows all of
this order, the order among the subset of labels
returned by any scan is in fact the same as the total
ordering on all the labeling operations⁴, no matter how
many labeling operations occurred while the labels were
being scanned!
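
The unbounded natural-number scheme just described can be sketched in a few lines. The following is only an illustration under simplifying assumptions: operations run one at a time (no real concurrency is simulated), processes are numbered from 0, and the class and method names are hypothetical rather than taken from the paper.

```python
class UnboundedCTSS:
    """Natural-number time-stamp system, run sequentially for illustration."""

    def __init__(self, n):
        # One unbounded label register per process, all initially 0.
        self.registers = [0] * n

    def collect(self):
        # One read of every register (order is irrelevant when run sequentially).
        return list(self.registers)

    def labeling(self, i):
        # Collect all labels, then write max + 1 into process i's register.
        labels = self.collect()
        self.registers[i] = max(labels) + 1
        return self.registers[i]

    def scan(self):
        # Return the labels read and the process ids ordered by (label, id);
        # ids break ties lexicographically, as in footnote 3.
        labels = self.collect()
        order = sorted(range(len(labels)), key=lambda p: (labels[p], p))
        return labels, order


ctss = UnboundedCTSS(3)
ctss.labeling(2)
ctss.labeling(0)
ctss.labeling(2)
labels, order = ctss.scan()
print(labels, order)
```

Because labels only grow, the order returned by the scan is simply the numeric order of the labels; the remainder of the paper is about obtaining the same behaviour when the registers are bounded.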

A Concurrent Time Stamp System is an abstract data type
shared among n concurrent and completely asynchronous
processes. There are two waitfree (see [H88, AG88])
operations that any process can execute on the ctss, a
labeling operation and a scan operation. Assume that
each process' program consists of these two operations,
whose execution generates a sequence of elementary
operation executions, totally ordered by the precedes
relation (of [L86a, L86c], denoted "$\longrightarrow$"),
and where any number of scan operation executions are
allowed between any two labeling operation executions.
The following


is an example of such a sequence by process i, where
$L_i^{[k]}$ denotes process i's k-th execution of a
labeling operation, and $S_i^{[k]}$ the k-th execution
of a scan operation (the superscript [k] is used for
notation, and is not visible to the processes). A global
time model⁵ of operation executions is assumed.

4 This property is simple to achieve using unbounded
labels, since the ordering among the labeling operations
is just the ordering among the labels. The fact that
such a property is achievable using bounded size labels
is somewhat baffling, since as the example in Section 3
shows, the order among the labeling operations cannot be
the order among the labels.

5 Implying that for any two operations,
$a \longrightarrow b$ or $b \longrightarrow a$ (for more
details see [L86c, B88]).


With each labeling operation execution $L_i^{[k]}$ a
label $\ell_i^{[k]}$ is associated. A scan operation
$S_i^{[k]}$ returns a pair $(\bar\ell, \prec)$, where
the label view
$\bar\ell = \{\ell_1^{[k_1]}, \ldots, \ell_n^{[k_n]}\}$
is an ordered set of labels⁶ (one per process), and
$\prec$ is an irreflexive total order among them, such
that:

P1 ordering: There exists an irreflexive total order
$\Longrightarrow$ on the set of all labeling operations,
such that:

a. precedence: For any pair of labeling operation
executions $L_p^{[a]}$ and $L_q^{[b]}$ (where possibly
p = q), if $L_p^{[a]} \longrightarrow L_q^{[b]}$ then
$L_p^{[a]} \Longrightarrow L_q^{[b]}$.

b. consistency: For any scan operation execution
$S_i^{[k]}$ returning $(\bar\ell, \prec)$,
$\ell_p^{[a]} \prec \ell_q^{[b]}$ if and only if
$L_p^{[a]} \Longrightarrow L_q^{[b]}$.


The above property formalizes the idea that a ctss can
be envisioned as a black box, inside which hides a
mechanism (a logical clock) associating causally ordered
time stamps, from an infinite totally ordered range,
with each of the labeling operations, and where scanning
is like peeping into this black box, each scan returning
a view of a part of this hidden ordering. The black box
metaphor is used to stress that it suffices to know of
the existence of such a total ordering
$\Longrightarrow$, while the ordering itself need not be
known.

One should bear in mind that the asynchronous nature of
the operations allows situations where a scan overlaps
many consecutive labeling operations of other processes.
Also, several consecutive scans could possibly be
overlapped by a single labeling operation. It is
therefore important that a requirement be made that the
label view $\bar\ell$ returned by $S_i^{[k]}$ be a
meaningful one, namely, reflecting the ordering among
labeling events immediately before or concurrent with
the scan, and not just any possible set of labels. This
will


6 For the purposes of many of the applications (such as
atomic register construction), one should allow the
label to include an associated value field, denoted
$value(\ell_i^{[k]})$. For the sake of simplicity,
discussion of how this added feature is implemented will
be deferred to the appendix.
eliminate uninteresting trivial solutions and introduce
a measure of liveness into the system. This requirement
is formalized in the following definition, where
$\dashrightarrow$ is the can affect relation of
[L86a, L86c].


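P2 regularity: For any label $\ell_p^{[a]}$ in
$\bar\ell$ of $S_i^{[k]}$,
$L_p^{[a]} \dashrightarrow S_i^{[k]}$, and there is no
$L_p^{[b]}$ such that
$L_p^{[a]} \longrightarrow L_p^{[b]} \longrightarrow S_i^{[k]}$.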


Though such a regular concurrent time stamp system
(P1-P2) would suffice for some applications (as in
Lamport's "Bakery Algorithm" [L74]), a more powerful
monotonic concurrent time stamp system will be needed in
applications such as the Multi-Reader-Multi-Writer
Atomic Register construction (as in [VA86]). To this end
the following third property is added:


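P3 monotonicity: For any label $\ell_p^{[a]}$ in
$\bar\ell$ of $S_i^{[k]}$, there does not exist an
$S_j^{[k']}$ with a label $\ell_p^{[b]}$ in its label
view $\bar\ell'$, such that
$S_i^{[k]} \longrightarrow S_j^{[k']}$ and
$L_p^{[b]} \longrightarrow L_p^{[a]}$ (possibly i = j).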


It is important to note that P3 does not imply that
labeling and scan operations of all processes are
serializable. It does however imply the serializability
of the scans of all processes and labeling of any one
process. The scans "behave" as if the labels of any
process are monotonically increasing, in the sense that
a scan returns a label of a labeling operation that is
at least as late as that of any labeling operation of a
label returned in the scans preceding it. In the
following section, a bounded implementation of a
concurrent time stamp system from atomic registers is
presented and informally justified. Rigorous
definitions⁷ and correctness proofs will appear in the
full version.


3  The  Implementation 


The  description  of  the  implementation  is  divided 
into  two  parts,  the  implementation  of  the  labeling 
operation,  and  the  implementation  of  the  scan. 


7 The above definitions do not include, for example,
initialization conditions of the system.


Figure 1: Scan Concurrent with Sequential Labelings

The key property of the labeling operation is to allow
establishing the causality-preserving total order
$\Longrightarrow$ among all labeling operation
executions. Though it is not required that a process
"knows" what this order is, it is required that the set
of labels that it "chooses" during a system execution is
such that an almighty outside observer, given a
description of the execution and based on the labels,
would be able to reconstruct $\Longrightarrow$. This
almighty observer could thus view all labeling operation
execution intervals as if they were shrunk to points,
that is, as if they were completely sequential.

Requiring this property alone will however not be
sufficient. As Example 3.1 shows, even if all labeling
operations are sequential, since labels are from a
bounded range (and therefore the same labels are
reused), a process scanning the labels concurrently with
ongoing labeling operations cannot deduce the order
$\Longrightarrow$ from the order of the labels alone.

Example 3.1. In Figure 1, segments represent operation
execution intervals, where time runs from left to right.
Two processes i and j perform labeling operations
sequentially, j followed by i, followed by many
labelings, till eventually the labels are reused, and j
for example uses the same label as before. A third
process z performs a scan concurrently with the
labelings, reading $\ell_i$ and then $\ell_j$. $S_1$ and
$S_2$ represent possible executions of this same scan,
the only difference being that many labeling operations
of other processes occurred between the reads in $S_2$.
In both the case that the scan is of the form $S_1$ and
the case that it is of the form $S_2$, the values
collected are $\ell_i = 2$ and $\ell_j = 1$, where the
order among the labels is, say, 1 < 2. However, in the
case of $S_1$, j's labeling preceded i's, while in
$S_2$, i's labeling preceded j's. Thus, the order of the
labels is not the order among the labeling operations,
introducing an unresolvable ambiguity.
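
The ambiguity of Example 3.1 can be replayed mechanically. The sketch below is only an illustration: the event encoding, the function name `run_scan`, and the intermediate label values 3 and 4 in the second history are hypothetical and stand in for the "many labelings" of the example.

```python
def run_scan(history):
    """Replay ('label', proc, value) and ('read', proc) events in order.

    Returns the values the scan read, and the processes ordered by the time
    at which the labeling that produced each read value took place.
    """
    registers, written_at, values, sources = {}, {}, {}, {}
    for t, event in enumerate(history):
        if event[0] == "label":
            _, proc, value = event
            registers[proc], written_at[proc] = value, t
        else:
            _, proc = event
            values[proc], sources[proc] = registers[proc], written_at[proc]
    return values, sorted(sources, key=sources.get)


# S1: both reads happen right after the two labelings.
s1 = [("label", "j", 1), ("label", "i", 2), ("read", "i"), ("read", "j")]

# S2: many labelings occur between the two reads, and j reuses label 1.
s2 = [("label", "j", 1), ("label", "i", 2), ("read", "i"),
      ("label", "j", 3), ("label", "i", 4), ("label", "j", 1),
      ("read", "j")]

print(run_scan(s1))
print(run_scan(s2))
```

Both histories hand the scanner the same pair of values, yet the labelings that produced them are ordered differently, which is exactly why the label order alone cannot be returned as the scan's result.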

Faced with the above ambiguity, it is clear that in
order to design a scan operation, the properties of the
labeling operation implementation should be such that
even though the order $\Longrightarrow$ between any pair
$L_i^{[a]}$ and $L_j^{[b]}$ is not conveyed by the order
of their associated labels, the labels do provide enough
information to allow an implementation of a scan
operation. The new implementation will not require that
by reading a pair of labels of processes i and j, one
will be able to establish the order among their
associated labeling operation executions. Instead, it
will be required that by reading the labels of i and j
more than once (yet only a constant number of times),
one will be able to choose from all the labels read, a
label of i and a label of j, for which the order
$\Longrightarrow$ among the labeling operation
executions can in fact be deduced. In the following
sections, after presenting these additional properties,
a scan operation implementation that utilizes them will
be shown.

The basic communication primitive used in the presented
implementations is a single-writer-multi-reader atomic
register. Constructions of such registers from weaker
primitives have been shown in [L86a, L86b, BP87, IL87,
N87]. The concurrent-time-stamp-system will consist of n
swmr atomic registers $v_i$, $i \in \{1..n\}$, each
$v_i$ written by process i, read by all, and having
values in some range V. In the unbounded natural number
implementation of a ctss, V is just the unbounded set of
natural numbers, and $\overset{v}{\prec}$ for any
labeling is the usual irreflexive total ordering among
them. In the following subsections, the set of possible
label values V, together with an irreflexive and
antisymmetric relation $\overset{v}{\prec}$ among them,
are defined in terms of a precedence graph⁸
$(V, \overset{v}{\prec})$. Each possible label value is
a node in this graph. The order among the labels in any
two registers is the order $\overset{v}{\prec}$
established by the edges of the precedence graph. Based
on the
8 See [IL87] for lower bounds on the size of such graphs.


precedence graph, an implementation of the labeling and
scan operations will then be provided. Unlike in the
unbounded natural number implementation, and following
the above discussion, the returned ordering $\prec$
among labeling operations is not the same as the
ordering $\overset{v}{\prec}$.

3.1 The Labels and the Precedence Relation

The following is the description of the precedence graph
$T^n$. Though the precedence graph (of unbounded size)
defined by the natural numbers is acyclic, this will not
be true for $T^n$.

Define A dominates B in G, where A and B are two
subgraphs of a graph G (possibly single nodes), to mean
that all nodes of A have edges directed to all nodes of
B. Define the following generalization of the
composition operator of [IL87]. The
$\alpha$-composition, $G \circ_\alpha H$, of two graphs
G and H, where $\alpha$ is a subset of the nodes of G,
is the following non-commutative operation:

Replace every node $v \in \alpha$ of G by a copy of H
(denoted $H_v$) and let $H_v$ (or v) dominate $H_u$ in
$G \circ_\alpha H$ if v dominates u in G.

Define the graph $T^2$ to be the following graph of 5
nodes: a cycle of three nodes {3,4,5} (where 3 dominates
5, which dominates 4, which in turn dominates 3), all
dominating the nodes {2,1}, where node 2 in turn
dominates node 1.

Define the graph $T^k$ (a complete tournament)
inductively to be:

1. $T^1$ is a single node.

2. $T^k = T^2 \circ_\alpha T^{k-1}$, where
$\alpha = \{5, 4, 3, 1\}$ and $k > 1$.
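
The inductive definition can be checked mechanically for small k. The sketch below is an illustration only: it represents a node as a tuple of $T^2$ digits (a hypothetical encoding chosen for this example), builds $T^k$ by the composition above, and confirms that the result is a tournament. The recurrence $|T^k| = 4|T^{k-1}| + 1$ with $|T^1| = 1$ gives $|T^k| = (4^k - 1)/3$ nodes.

```python
from itertools import combinations

CYCLE = {(3, 5), (5, 4), (4, 3)}  # (a, b) means node a dominates node b


def t2_dominates(a, b):
    """Edge relation of T^2 on distinct nodes 1..5."""
    if a >= 3 and b >= 3:
        return (a, b) in CYCLE
    return a > b  # cycle nodes dominate 2 and 1; node 2 dominates node 1


def build_T(k):
    """Node list of T^k; each node is a tuple of T^2 digits."""
    if k == 1:
        return [()]
    smaller = build_T(k - 1)
    nodes = []
    for v in (1, 2, 3, 4, 5):
        if v in (5, 4, 3, 1):   # alpha: replaced by a copy of T^(k-1)
            nodes += [(v,) + h for h in smaller]
        else:                   # node 2 is never expanded
            nodes.append((v,))
    return nodes


def dominates(x, y):
    """x dominates y: decided by the first digit at which they differ."""
    for a, b in zip(x, y):
        if a != b:
            return t2_dominates(a, b)
    return False


for k in range(1, 5):
    nodes = build_T(k)
    is_tournament = all(dominates(x, y) != dominates(y, x)
                        for x, y in combinations(nodes, 2))
    print(k, len(nodes), is_tournament)
```

Since each digit needs only three bits and a node of $T^n$ is described by at most n digits, this is consistent with the $O(n)$ bits per register stated in the introduction.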

The graph $T^n = (V, \overset{v}{\prec})$ is the
precedence graph to be used in the implementation of the
labeling and scan algorithms of a concurrent time-stamp
system for n processes. For any process i, each node in
$T^n$ corresponds to a uniquely defined label value
$\ell_i$. The label can be viewed as a string
$\ell_i[n..1]$ of n digits, where each
$\ell_i[k] \in \{1 \ldots 5\}$ is the digit of the
corresponding node in $T^2$, replaced by a $T^k$
subgraph during the k-th step of the inductive
construction above. The digit $\ell_i[n]$ is always 1,
representing the complete $T^n$ graph, and if
$\ell_i[k] = 2$, then $\ell_i[j] = 1$ for all
$j \in \{k-1..1\}$ (since node 2 is never expanded in
the induction step). Therefore, given any label
$\ell_i$, the $T^k$ subgraph of $T^n$ in which its
corresponding node is located is identified by the
corresponding prefix $\ell_i[n..k]$.

Figure 2: The Recursive Graph Structure for $T^2$ and
$T^3$

To assure that based on the graph $T^n$ a total ordering
among the label values returned by a scan can be
established, one needs to break symmetry among processes
having the same label. As usual, process-ids are used.
Thus, the label $\ell_i$ is assumed to be concatenated
with the id of process i. The label and id are
lexicographically ordered. This, in terms of the graph
$T^n$, amounts to no more than assuming that each $T^1$
graph consists of a total order tournament of n nodes,
each process i always choosing the i-th node in the
order. For the sake of simplicity this point is not
elaborated on in the sequel.

3.2 The Labeling Operation

Let the collect operation by any process i be a reading
of all the registers $v_j$, $j \in \{1..n\}$, once each,
in an arbitrary order, returning a label set $\ell$ (not
to be confused with $\bar\ell$, the output label view of
a scan operation). The labeling operation of a process i
is of the form described below, where
$\mathcal{L} : V^n \times \{1..n\} \mapsto V$ is a
labeling function, returning a label value $\ell_i$
"greater than" all other label values⁹. This is a form
similar to the natural number ctss, where the labeling
function $\mathcal{L}$ is just $\max(\ell) + 1$.
However, the interpretation of "greater than" is not as
straightforward as in the natural number case.

procedure labeling;
begin
    $\ell$ := collect;
    $\ell_i$ := $\mathcal{L}(\ell, i)$
end;

The definition of the labeling function
$\mathcal{L}(\ell, i)$ presented below is based on a
recursively defined function
$\mathcal{L}^k(G, \ell, \ell_{\max})$, which, given a
$T^k$ subgraph G of $T^n$, a set of labels $\ell$, and a
"maximal" label $\ell_{\max} \in \ell$ in $T^k$, returns
the label of a node in G that is, as termed above,
"greater than" the other labels. For the sake of
simplicity, and since the collected set of labels $\ell$
remains unchanged in $\mathcal{L}(\ell, i)$ once it is
collected (similarly the variable $\ell_{\max}$, once it
is computed), it is treated as a global variable and is
not passed as a parameter in all the utility functions
used by $\mathcal{L}(\ell, i)$. The following functions
are used in defining $\mathcal{L}$:

num_labels(G) - a function that, for the given label set
$\ell$, returns how many of the labels are in subgraph
G;

dom(x) - a function that, for a given digit
$x \in \{1..5\}$ representing a node in the graph $T^2$,
returns the next dominating node, namely, dom(1) = 2,
dom(2) = 3, dom(3) = 4, dom(4) = 5 and dom(5) = 3;

dominating_set($\ell'$, $\ell_i$) - a function that, for
a set of labels $\ell' \subseteq \ell$, and a label
$\ell_i \in \ell'$, returns a subset of labels
$\{\ell_j \in \ell' \mid \ell_i \overset{v}{\prec} \ell_j\} \cup \{\ell_i\}$;
and

max($\ell'$) - a function that, for a set of labels
$\ell' \subseteq \ell$, returns a label

$\{\ell_j \in \ell' : |dominating\_set(\ell', \ell_j)| \le |dominating\_set(\ell', \ell_k)|, \forall \ell_k \in \ell'\}$,

the maximal label, i.e., the one least dominated within
this set.

9 Initially, all labels are on node 11..1, the node
dominated by all others in $T^n$.


Denote the concatenation operation, where G is a string
and x is a digit, by G.x. The following is thus the
definition of the labeling function
$\mathcal{L}(\ell, i)$. The subgraphs G are identified
with the relative prefixes, where $T^n$ is identified
with the label 1:

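function $\mathcal{L}(\ell, i)$;
  function $\mathcal{L}^k(G)$;
  begin
  1: if k = 1 then return G;
  2: if $\ell_{\max}[n..k] \neq G$
       then return $\mathcal{L}^{k-1}(G.1)$;
  3: if $\ell_{\max}[n..k-1] = G.2$
       then return $\mathcal{L}^{k-1}(G.3)$;
  4: if k > 2 then
       if $\ell_{\max}[k-2] \in \{2, 3, 4, 5\}$ and
          ($\ell_i[n..k-1] \neq \ell_{\max}[n..k-1]$)
       then return $\mathcal{L}^{k-1}(G.dom(\ell_{\max}[k-1]))$;
  5: if (num_labels($\ell_{\max}[n..k-1]$) < k-1) or
        ((num_labels($\ell_{\max}[n..k-1]$) = k-1) and
         ($\ell_i[n..k-1] = \ell_{\max}[n..k-1]$))
       then return $\mathcal{L}^{k-1}(G.\ell_{\max}[k-1])$
       else return $\mathcal{L}^{k-1}(G.dom(\ell_{\max}[k-1]))$;
  end $\mathcal{L}^k$;
begin
  $\ell_{\max}$ := max(dominating_set($\ell$, $\ell_i$));
  return $\mathcal{L}^n(T^n)$;
end $\mathcal{L}$;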
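
The listing above can be exercised on the label values used in the examples of this section. The following is a minimal sketch rather than the authors' code. It assumes one plausible reading of the listing (in particular, that the comparison in line 4 is an inequality), represents the collected set as a hypothetical dictionary from process id to digit string, and assumes that ties between equal labels are broken in favour of the larger id.

```python
CYCLE = {(3, 5), (5, 4), (4, 3)}       # (a, b): a dominates b in the T^2 cycle
DOM = {1: 2, 2: 3, 3: 4, 4: 5, 5: 3}   # dom(x) as defined in the text


def dominates(x, y):
    """x, y are (digits, pid) pairs; digits[0] is the n-th (leftmost) digit."""
    for a, b in zip(x[0], y[0]):
        if a != b:
            a, b = int(a), int(b)
            return (a, b) in CYCLE if a >= 3 and b >= 3 else a > b
    return x[1] > y[1]                 # assumed tie-break: larger id wins


def labeling_function(labels, i):
    """labels: dict pid -> digit string (the collected set); returns i's new digits."""
    n = len(labels[i])
    tagged = [(d, p) for p, d in labels.items()]

    def dominating_set(subset, x):
        return [y for y in subset if dominates(y, x)] + [x]

    def maximal(subset):               # the label least dominated within subset
        return min(subset, key=lambda x: len(dominating_set(subset, x)))

    l_i = labels[i]
    l_max = maximal(dominating_set(tagged, (l_i, i)))[0]

    def num_labels(prefix):
        return sum(d.startswith(prefix) for d in labels.values())

    def pre(label, k):                 # label[n..k] in the paper's indexing
        return label[: n - k + 1]

    def digit(label, k):               # label[k]
        return int(label[n - k])

    def rec(k, G):
        if k == 1:                                            # line 1
            return G
        if pre(l_max, k) != G:                                # line 2
            return rec(k - 1, G + "1")
        if pre(l_max, k - 1) == G + "2":                      # line 3
            return rec(k - 1, G + "3")
        nxt = G + str(DOM[digit(l_max, k - 1)])
        if (k > 2 and digit(l_max, k - 2) >= 2
                and pre(l_i, k - 1) != pre(l_max, k - 1)):    # line 4
            return rec(k - 1, nxt)
        count = num_labels(pre(l_max, k - 1))                 # line 5
        if count < k - 1 or (count == k - 1
                             and pre(l_i, k - 1) == pre(l_max, k - 1)):
            return rec(k - 1, G + str(digit(l_max, k - 1)))
        return rec(k - 1, nxt)

    return rec(n, "1")


start = {"x": "134", "y": "135", "z": "141"}
print(labeling_function(start, "y"))
```

Under these assumptions the sketch yields the label values that appear in the examples of this section for the collected sets used there; it does not model concurrency, only a single evaluation of the labeling function on a given collected set.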

For the purpose of giving the reader some intuition
about the properties of the labeling operation, let it
be assumed that one can talk about the values of the
labels of all processes at "points in time". Though the
goal in the remainder of this section is to show how the
labeling operation executions allow to define the order
$\Longrightarrow$, it will first be shown that they meet
a much simpler requirement. The requirement is that at
any point in time, the subgraph of the precedence graph
$T^n$ induced by the labeled nodes (those whose
corresponding label is written in some $v_i$) contains
no cycle. Since $T^n$ is a complete tournament, this
implies that at any point in time, all labels are
totally ordered.

The labeling operation executions maintain two
"invariants," namely, that at any point in time (1)
there are labels on at most two of the three nodes in
any cycle of any subgraph $T^k$ (the cycle consists of
"supernodes" {3,4,5}, called supernodes since they are
actually $T^{k-1}$ subgraphs), and (2) there are no more
than k labels in the cycle of any subgraph $T^k$.
Maintaining the second invariant is the key to
maintaining the first, and the first implies that at any
point in time, there are never any cycles among labels.

The manner by which the invariance of (1) and (2) is
preserved is explained via several examples. In these
examples, $T^3$ is a precedence graph for a system of
three processes x, y and z. All examples start at a
point in time where $\ell_x^{[a]} = 134$,
$\ell_y^{[b]} = 135$, and $\ell_z^{[c]} = 141$, that is,
all labels are totally ordered by $\overset{v}{\prec}$.

Example 3.2. Assume that the following sequence of
labeling operation executions occur sequentially.
Process y performs $L_y^{[b+1]}$, reading
$\ell_x^{[a]}$, $\ell_y^{[b]}$ and $\ell_z^{[c]}$, and
moving, based on $\mathcal{L}(\ell, y)$, to
$\ell_y^{[b+1]} = 142$. Process z performs
$L_z^{[c+1]}$, reading the new label $\ell_y^{[b+1]}$
and thus moving to the $T^2$ subgraph 14
($\ell_y^{[b+2]} = 144$, $\ell_z^{[c+2]} = 145$,
$\ell_y^{[b+3]} = 143$, ...), maintaining the above
invariants, because the $T^2$ graph is a precedence
graph for 2 processes. If at some point x moves, in
$L_x^{[a+1]}$ it will read the labels of both z and y as
being in the $T^2$ subgraph 14. Since
num_labels('14') = 2, by line 5 of $\mathcal{L}$, x will
move to $\ell_x^{[a+1]} = 151$.

The reader can convince himself that following any
labeling operation execution $L_z^{[c]}$ by some process
z, the above invariants hold, and that for the set
$\ell$ of labels that were read in $L_z^{[c]}$'s collect
operation (denoted $read(L_z^{[c]})$), it is the case
that
$(\forall \ell_y^{[k]} \in read(L_z^{[c]}))(\ell_y^{[k]} \overset{v}{\prec} \ell_z^{[c]})$,
that is, the new label chosen is greater than all those
read.

As seen in the following example, in the concurrent
case, more than k labels may move into the same $T^k$
structure at the same time. It is thus not immediately
clear why the second invariant holds.

Example 3.3. Assume that the following sequence of
labeling operation executions occur concurrently.
Processes x and y begin performing $L_x^{[a+1]}$ and
$L_y^{[b+1]}$ concurrently, reading $\ell_x^{[a]}$,
$\ell_y^{[b]}$ and $\ell_z^{[c]}$, and computing
$\mathcal{L}$ such that
$\ell_x^{[a+1]} = \ell_y^{[b+1]} = 142$. If they then
continue to complete their operations by writing their
labels, though they have the same node as a label, they
were concurrent, and can be ordered by relative id. If
any of them then continued to perform a new labeling
operation, since num_labels('14') > 2, it would choose
label 151, not entering the cycle. However, let us
suppose that they do not both complete writing their
labels, that is, x stops just before writing to $v_x$,
while y writes $\ell_y^{[b+1]} = 142$. Process z then
performs $L_z^{[c+1]}$, reading the new label
$\ell_y^{[b+1]}$ and the old label $\ell_x^{[a]}$, thus
moving to $\ell_z^{[c+1]} = 143$. Processes y and z
continue to move into and in the cycle of the $T^2$
subgraph 14, since they continue to read x's old label.
Then, at some point x completes $L_x^{[a+1]}$, and there
are three labels in 14 (two of them in the cycle).
However, if x now performs a new labeling $L_x^{[a+2]}$
it will read the labels of both z and y as being in 14.
Since num_labels('14') > 2, by line 5 of
$\mathcal{L}(\ell, x)$, x will move to
$\ell_x^{[a+2]} = 151$, not entering the cycle.

Generalizing the above example, even if many processes
move into a $T^k$ subgraph without reading one another's
labels, at most k of them will enter the cycle in $T^k$.
The reason is the following well known flag
principle¹⁰:

If k+1 people each first raise a flag, and then count
the number of raised flags, at least one person must see
k+1 flags raised.

By the definition of the labeling function $\mathcal{L}$, each process moving into the cycle of a $T^k$ subgraph must first move to either supernode 1 or 2 in $T^k$; only then can it perform a labeling into the cycle. The move to 1 or 2 is the raising of the flag, and the move into the cycle is the counting of all flags.
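The flag principle can be checked exhaustively for small groups. The following is a minimal Python sketch, not part of the construction itself: the function names and the event encoding are assumptions made for illustration. It replays every interleaving in which each person raises a flag before counting, and confirms that someone always sees all flags raised.

```python
from itertools import permutations

def max_flags_seen(schedule):
    """Replay one interleaving; return the largest count any person sees.

    `schedule` is a sequence of (person, action) events, where action is
    "raise" or "count" (a hypothetical encoding used only in this sketch).
    """
    raised = set()
    seen = {}
    for person, action in schedule:
        if action == "raise":
            raised.add(person)
        else:
            seen[person] = len(raised)
    return max(seen.values())

def valid_schedules(people):
    """Yield every interleaving in which each person raises before counting."""
    events = [(p, a) for p in range(people) for a in ("raise", "count")]
    for order in permutations(events):
        if all(order.index((p, "raise")) < order.index((p, "count"))
               for p in range(people)):
            yield order

def flag_principle_holds(people):
    """True if in every valid interleaving someone sees all flags raised."""
    return all(max_flags_seen(s) == people for s in valid_schedules(people))
```

With `people = k + 1`, the check mirrors the argument in the text: whoever raises last must count afterwards, and so sees all k + 1 flags.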

The following example shows that even though, by the above, there are at most k labels at a time in any $T^k$ structure, the sets of labels read in a labeling operation execution may contain cycles.

Example 3.4. Process z begins performing $L_z^{[c+1]}$, reading $\ell_x^{[a]} = 134$. Process y then performs $L_y^{[b+1]}$, reading $\ell_x^{[a]}$ and $\ell_z^{[c]}$, and moving to $\ell_y^{[b+1]} = 142$. Process x performs $L_x^{[a+1]}$, reading the new label $\ell_y^{[b+1]}$ and $\ell_z^{[c]}$, and thus by line 5 of $\mathcal{L}$, moving to $\ell_x^{[a+1]} = 151$. Process y then performs $L_y^{[b+2]}$, reading $\ell_x^{[a+1]}$ and moving to $\ell_y^{[b+2]} = 152$. Finally process z reads $\ell_y^{[b+2]}$. It thus read $\ell_x^{[a]} = 134$, $\ell_y^{[b+2]} = 152$, and $\ell_z^{[c]} = 141$, three labels on a cycle.

10 Proof follows by the fact that the last person to start counting flags must have seen k + 1 flags raised.

In order to select a label dominating all others, z must establish where the "maximal label" among them is. To overcome the problem that the labels read form cycles (as in the above example), the labeling function $\mathcal{L}(\ell, i)$ does not take into account "old values" such as $\ell_x^{[a]}$; it considers only $\ell_y^{[b]}$'s that dominate the current label $\ell_z^{[c]}$.

To maintain the first invariant, z should choose $\ell_z^{[c+1]} = 131$, to dominate the current labels of both x and y. However, there is seemingly a problem, since z did not read the label $\ell_x^{[a+1]} = 151$, and so, how can it decide what label to choose in order to dominate $\ell_x^{[a+1]} = 151$? The solution is due to the fact that z can deduce the existence of $\ell_x^{[a+1]} = 151$, since in all of the cycle of $T^3$ there are 3 labels, and in order to move to $\ell_y^{[b+2]} = 152$, y must have read some label in node 151 of the $T^2$ subgraph 15. By simple elimination this must be the label of x. This simple rule is maintained by application of line 4 in $\mathcal{L}$. However, if the above scenario occurred in the cycle of a $T^k$ graph, where k > 3, then in order to allow the same reasoning as above, it must be that if z read $\ell_y^{[b+2]} = 152$ (or $\ell_y^{[b+2]} \in \{153, 154, 155\}$), it can conclude that k - 2 other labels were read by $L_y^{[b+2]}$ in the $T^{k-1}$ subgraph 15. It is for this purpose that supernode 1 of any $T^k$ graph, where k > 2, is not a single node, but a $T^{k-1}$ subgraph. A process can thus choose the node 2 only after it established that there were k - 1 labels in supernode 1. Since node 2 is a "bridge" that some process must "cross" (choose) before any process can move into the cycle, the above reasoning holds.

Though the above invariants hold, it follows from Example 3.4 that the property that the chosen new label is greater than all those read, true for sequential labeling operation executions, does not hold in the concurrent case. Fortunately, there is a similar property that does hold, a property that will prove important in the implementation of the scan. Let the notation $r_j(L_i^{[a]})$ and $w(L_i^{[a]})$ denote the read of $v_j$ and write of $v_i$, during a labeling operation execution $L_i^{[a]}$ by a process i.

3.3 The Scan Operation

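Definition 3.1. Labeling $L_i^{[a]}$ is observed by $L_y^{[b]}$ (denoted $L_i^{[a]} \xrightarrow{obs} L_y^{[b]}$) if $r_i(L_y^{[b]}) = \ell_i^{[a]}$, or there exists an $L_z^{[c]}$ such that $r_z(L_y^{[b]}) = \ell_z^{[c]}$ and $L_i^{[a]} \xrightarrow{obs} L_z^{[c]}$.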


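The relation $\xrightarrow{obs}$ is actually the transitive closure of the read relation. Let $maximal\_obs(L_i^{[a]})$ be the set of operation executions

$\{L_y^{[b]} \mid y \in \{1..n\},\ L_y^{[b]} \xrightarrow{obs} L_i^{[a]}$ and $(\forall L_y^{[b']})$ if $L_y^{[b]} \longrightarrow L_y^{[b']}$ then $L_y^{[b']} \not\xrightarrow{obs} L_i^{[a]}\}$,

that is, including the "latest" label observed for each process. In the concurrent executions, instead of the new label being greater than all the labels read, it is the case that

$(\forall L_y^{[b]} \in maximal\_obs(L_i^{[a]}))(\ell_y^{[b]} \prec \ell_i^{[a]})$,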

namely,  the  new  label  chosen  is  greater  than  the 
latest  of  those  observed  For  the  labeling  z]'*1' 
of  Example  3  •/,  though  f  read  =  143,  and 
/!"+1'  i<  /j3’,  it  is  the  case  that  its  maximal  ob¬ 
served  label  is  and  2<. 
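Since the observed relation is the transitive closure of the read relation, it can be computed by a plain graph traversal. The following is a minimal Python sketch under stated assumptions: an execution is encoded as a hypothetical pair `(process, index)`, `reads` maps each execution to the executions whose labels it read directly, and a higher index stands in for "later" within one process. None of these names come from the paper.

```python
def observed(op, reads):
    """All executions observed by `op`: the transitive closure of `reads`."""
    seen = set()
    stack = list(reads.get(op, ()))
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(reads.get(cur, ()))
    return seen

def maximal_obs(op, reads):
    """Latest observed execution of each process (highest index wins)."""
    latest = {}
    for proc, idx in observed(op, reads):
        if proc not in latest or idx > latest[proc]:
            latest[proc] = idx
    return {(proc, idx) for proc, idx in latest.items()}

# A read pattern loosely modeled on Example 3.4: z reads x's old label
# directly, but observes x's newer labeling through y.
reads = {
    ("z", 1): {("x", 0), ("y", 2)},
    ("y", 2): {("x", 1)},
    ("x", 1): {("y", 1), ("z", 0)},
    ("y", 1): {("x", 0), ("z", 0)},
}
```

In this pattern `maximal_obs(("z", 1), reads)` contains `("x", 1)` rather than the directly read `("x", 0)`, which is the distinction the property above relies on.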

Finally,  the  following  is  the  irreflexive  total  or¬ 
der  =>  on  the  labeling  operation  executions  as 
required  by  property  Pi. 


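Definition 3.2. Given any two distinct labeling operation executions $L_i^{[a]}$ and $L_y^{[b]}$, $L_i^{[a]} \Rightarrow L_y^{[b]}$ if either

1. $L_i^{[a]} \xrightarrow{obs} L_y^{[b]}$, or

2. $L_i^{[a]} \not\xrightarrow{obs} L_y^{[b]}$, $L_y^{[b]} \not\xrightarrow{obs} L_i^{[a]}$, and $\ell_i^{[a]} \prec \ell_y^{[b]}$.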


Intuitively, since with every $L_i^{[a]}$ there is an associated label $\ell_i^{[a]}$, $\Rightarrow$ is a "lexicographical" order on pairs $(L_i^{[a]}, \ell_i^{[a]})$. The first element in the pair is ordered by $\xrightarrow{obs}$, a partial order that is consistent with the ordering $\longrightarrow$ (if $L_i^{[a]} \longrightarrow L_y^{[b]}$ then in $L_y^{[b]}$, y read $\ell_i^{[a]}$ or a later label). The second element is ordered by $\prec$, an irreflexive and antisymmetric relation. In the full paper it is proven that the "static" relation $\prec$ on the labels completes the "dynamic" partial order $\xrightarrow{obs}$ to a total order $\Rightarrow$ on all labeling operation executions.


The scan algorithm consists of two main steps: performing a sequence of 8n log n collect operations^11, and analyzing the collected labels to select a set $\ell$ for which an order $\prec$ can be returned.

Let $\ell^{c,m,k}$, $c \in \{1..8\}$, $m \in \{1..\lceil\log n\rceil\}$, and $k \in \{1..n\}$ denote variables, each holding a set of labels collected in the $c$th collect operation execution of the $m$th level of the $k$th phase. Let half(r) and other_half(r) be complementary functions that, for a given set r, return two disjoint subsets r1 and r2, such that $r1 \cup r2 = r$ and $-1 \leq |r1| - |r2| \leq 1$.
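One possible realization of the two complementary functions is sketched below in Python. The paper only constrains their output, so the choice made here (sort the set, then split it in the middle) is an assumption made for illustration.

```python
def half(r):
    """First part of a balanced split of the set r (sorted for determinism)."""
    items = sorted(r)
    return items[: (len(items) + 1) // 2]

def other_half(r):
    """Complement of half(r) within r."""
    items = sorted(r)
    return items[(len(items) + 1) // 2:]
```

Any split satisfying the two stated requirements would serve equally well; sorting merely guarantees that `half` and `other_half` agree on where the cut falls.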

The scan algorithm returns the set of labels $\ell$, one of each process, and the ordering $\prec$ among them is represented by the vector O holding a permutation of numbers in {1..n}, the number in the $i$th position representing the relative order of the label $\ell_i$^12.

function scan;

    function select(m, k, r);
    begin
        if |r| = 1 then return (x : x ∈ r);
        else
            x := select(m - 1, k, half(r));
            y := select(m - 1, k, other_half(r));
            if (∃ c1, c2 ∈ {1..8}) (c1 < c2) ∧ ($\ell_x^{c1,m,k} \prec \ell_y^{c2,m,k}$)
                then return y
                else return x
            fi;
        fi;
    end select;

begin
    R := {1..n};
    O[1..n] := 0;
    $\ell$ := ∅;
    for k := 1 to n do
        for m := 1 to ⌈log n⌉ do
            for c := 1 to 8 do
                $\ell^{c,m,k}$ := collect;
            od;
        od;
    od;
    for k := n downto 1 do
        s := select(⌈log n⌉, k, R);
        $\ell$ := $\ell \cup \{\ell_s^{8,\lceil\log n\rceil,k}\}$;
        O[s] := k;
        R := R - {s};
    od;
    return ($\ell$, O);
end scan.

11 Note that the scan algorithm requires a scanning process only to read other labels, and does not require it to write. This lack of a need for two way communication between the scanner and the labelers is a property found in the implementation of the natural number based ctss.

12 For the sake of simplicity, though the returned labels in $\ell$ could contain various data associated with the given labeling operation (that is, data written into the register $v_i$ together with the implementation label value), the scan implementation will return only the implementation label value $\ell_i$.


The scan operation, as noted above, begins with a sequence of $8n\lceil\log n\rceil$ collect operations, for which the returned labels are all saved in a set of variables $\ell^{c,m,k}$, $c \in \{1..8\}$, $m \in \{1..\lceil\log n\rceil\}$, and $k \in \{1..n\}$. The remainder of the algorithm defines how to choose n of these labels, one per process, for which $\prec$ (i.e. $\Rightarrow$) can be established. The following is an outline of how this selection process is performed.
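The control structure of the scan (collect everything first, then select one winner per phase) can be sketched in Python as follows. This is a structural illustration only, not the bounded construction: `collect` and `dominated` are hypothetical caller-supplied helpers, processes are numbered from 0, and the level count is forced to at least 1 so that the case n = 1 still performs a collect.

```python
def scan(n, collect, dominated):
    """Structural sketch of scan; returns (labels, order).

    collect()       -> list of n labels, one per process (assumed helper)
    dominated(a, b) -> True if label a is dominated by label b (assumed)
    """
    levels = max(1, (n - 1).bit_length())  # ceil(log2 n), but at least 1
    # store[(c, m, k)] holds the labels from collect c of level m of phase k.
    store = {}
    for k in range(1, n + 1):
        for m in range(1, levels + 1):
            for c in range(1, 9):
                store[(c, m, k)] = collect()

    def select(m, k, r):
        if len(r) == 1:
            return r[0]
        mid = (len(r) + 1) // 2
        x = select(m - 1, k, r[:mid])
        y = select(m - 1, k, r[mid:])
        # y wins if an earlier label of x is dominated by a later label of y.
        for c1 in range(1, 9):
            for c2 in range(c1 + 1, 9):
                if dominated(store[(c1, m, k)][x], store[(c2, m, k)][y]):
                    return y
        return x

    remaining = list(range(n))
    labels = [None] * n
    order = [0] * n
    for k in range(n, 0, -1):
        s = select(levels, k, remaining)
        labels[s] = store[(8, levels, k)][s]
        order[s] = k
        remaining.remove(s)
    return labels, order
```

For instance, with static natural-number labels, `scan(4, lambda: [5, 3, 9, 1], lambda a, b: a < b)` performs 8 · 2 · 4 = 64 collects and ranks the process holding 9 highest. The interesting behavior of the real algorithm arises only when labels change between collects, which this static example does not exercise.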

By the order of label collection, the labels read in phase k = 1 are the earliest to have been collected, those for k = n the last. From the $8\lceil\log n\rceil$ collected label sets of each phase, the algorithm selects one label. The selected label in the $k$th phase will be the $k$th largest in the order $\prec$. As it turns out, to guarantee that this is the case, it suffices that the following Condition 1 holds (slightly abusing notation in the definition):


    For the label $\ell_s^{8,\lceil\log n\rceil,k}$ collected in the $\lceil\log n\rceil$th level of the $k$th phase, and any label $\ell_y^{1,1,k}$ of a process $y \in R$ collected in the 1st level of the $k$th phase, it is the case that $L_y^{1,1,k} \Rightarrow L_s^{8,\lceil\log n\rceil,k}$.

Maintaining Condition 1 is sufficient to assure that the label returned in the $k$th phase is the $k$th largest. Let it be shown that the labeling operation execution of a label returned in a phase k' < k preceded (in the ordering $\Rightarrow$) that of the label returned in the phase k. The following shows that this is the case for the labels $\ell_x^{8,\lceil\log n\rceil,k}$, $\ell_y^{8,\lceil\log n\rceil,k-1}$ and $\ell_z^{8,\lceil\log n\rceil,k-2}$ returned in phases k, k-1, and k-2 respectively. The same line of proof can be extended inductively to all k' < k.

By Condition 1, $L_y^{1,1,k} \Rightarrow L_x^{8,\lceil\log n\rceil,k}$. Since the read of $\ell_y^{1,1,k}$ was performed after that of $\ell_y^{8,\lceil\log n\rceil,k-1}$, either the label of the same labeling operation execution was read in both cases, or $L_y^{8,\lceil\log n\rceil,k-1} \longrightarrow L_y^{1,1,k}$. In both cases $L_y^{8,\lceil\log n\rceil,k-1} \Rightarrow L_x^{8,\lceil\log n\rceil,k}$. By similar reasoning $L_z^{8,\lceil\log n\rceil,k-2} \Rightarrow L_y^{8,\lceil\log n\rceil,k-1}$, which by transitivity of $\Rightarrow$ establishes $L_z^{8,\lceil\log n\rceil,k-2} \Rightarrow L_x^{8,\lceil\log n\rceil,k}$.

The select function applied in any phase is a recursively defined "winner take all" type selection algorithm, among all the processes in R. It returns the id of the "winner," a process s meeting Condition 1. At any level m of the application of select, select(m, k, r), the winners of the selections at level m - 1 are paired up, and from each pair one "winner" process is selected, to be passed on to the (m+1)th level of selection. After at most $\lceil\log |R|\rceil$ levels, s, the winner of all selections, is returned.

Based on the definition of the select function, maintaining the following Condition 2 suffices to assure that the label of the process s returned by select(m, k, r) meets Condition 1.

    Of the two processes x and y in the application of select at level m of phase k, the one returned, say x, is such that $L_y^{1,m,k} \Rightarrow L_x^{8,m,k}$, where $\ell_y^{1,m,k}$ and $\ell_x^{8,m,k}$ respectively are the labels associated with these labeling operation executions.

Maintaining Condition 2 suffices for the following reason. If at level m process x was selected between x and y, and at level m - 1 process y was selected between y and z, by the same line of proof as above, from $L_y^{1,m,k} \Rightarrow L_x^{8,m,k}$ and $L_z^{1,m-1,k} \Rightarrow L_y^{8,m-1,k}$, it follows that $L_z^{1,m-1,k} \Rightarrow L_x^{8,m,k}$. By induction this implies Condition 1.

Recall Example 3.1, implying that it is impossible to establish the order $\Rightarrow$ among two labeling operation executions from the order among their associated labels alone. To overcome this problem, instead of attempting to decide the order between two given labeling operation executions, the algorithm will choose a pair out of several given labeling operation executions, for which the order $\Rightarrow$ can be determined. Thus, to allow the select operation at level m of phase k to choose a "winner" process, say x, for which $L_y^{1,m,k} \Rightarrow L_x^{8,m,k}$, the labels of x and y from 8 consecutive collects will be analyzed.

Let it first be shown that if the following Condition 3 holds for y, that is

$(\exists c1, c2 \in \{1..8\})(c1 < c2) \wedge (\ell_x^{c1,m,k} \prec \ell_y^{c2,m,k})$,

then $L_x^{c1,m,k} \Rightarrow L_y^{c2,m,k}$ (this, because of the order of label collecting, will imply $L_x^{1,m,k} \Rightarrow L_y^{8,m,k}$). Assume by way of contradiction that $L_y^{c2,m,k} \Rightarrow L_x^{c1,m,k}$. Since $\ell_x^{c1,m,k} \prec \ell_y^{c2,m,k}$, it must be by the definition of $\Rightarrow$ that $L_y^{c2,m,k} \xrightarrow{obs} L_x^{c1,m,k}$. It cannot be that $L_y^{c2,m,k} \in maximal\_obs(L_x^{c1,m,k})$, since by the properties of the labeling scheme, for the label $\ell_y^{[b]}$, $L_y^{[b]} \in maximal\_obs(L_x^{c1,m,k})$, $\ell_y^{[b]} \prec \ell_x^{c1,m,k}$. Thus, there must be a different labeling operation execution $L_y^{[b]} \in maximal\_obs(L_x^{c1,m,k})$, $L_y^{c2,m,k} \longrightarrow L_y^{[b]}$. This label $\ell_y^{[b]}$ was already observed (i.e. must have been written) before the end of the read of $\ell_x^{c1,m,k}$. Thus, $\ell_y^{[b]}$, or a label later than it, must have been read instead of $\ell_y^{c2,m,k}$ in the collect c2 of level m in phase k, a contradiction.

It remains to be shown that if Condition 3 does not hold for y, it is the case that $L_y^{1,m,k} \Rightarrow L_x^{8,m,k}$ and x can be correctly returned. Assume by way of contradiction that Condition 3 does not hold for y. It cannot, by the same arguments as above, be that Condition 3 holds for x, that is, $(\exists c1, c2 \in \{1..8\})(c1 < c2) \wedge (\ell_y^{c1,m,k} \prec \ell_x^{c2,m,k})$. Therefore, it must be that there are four nonconsecutive collects of $\ell_y^{c1,m,k}$, $c1 \in \{1,3,5,7\}$, and four nonconsecutive collects of $\ell_x^{c2,m,k}$, $c2 \in \{2,4,6,8\}$, such that the labels $\ell_y^{c1,m,k}$, $c1 \in \{1,3,5,7\}$ are all different from one another, and the labels $\ell_x^{c2,m,k}$, $c2 \in \{2,4,6,8\}$ are all different from one another. The reason is that if any two of them are the same, say $\ell_y^{3,m,k}$ and $\ell_y^{5,m,k}$, then in order for the above Condition 3 not to hold for x with c1 = 3 and c2 = 4, it must be that $\ell_x^{4,m,k} \prec \ell_y^{3,m,k}$. But since $\ell_y^{3,m,k}$ and $\ell_y^{5,m,k}$ are the same, it would follow that $\ell_x^{4,m,k} \prec \ell_y^{5,m,k}$, and Condition 3 would hold for y, a contradiction.

To complete the proof, it remains to be shown that if the labels $\ell_y^{c1,m,k}$, $c1 \in \{1,3,5,7\}$ are all different from one another, and the labels $\ell_x^{c2,m,k}$, $c2 \in \{2,4,6,8\}$ are all different from one another, then $L_y^{1,m,k} \Rightarrow L_x^{8,m,k}$. The situation above is such that during the 8 collect operations, each of the processes x and y executed a new labeling operation at least 3 times. It can be formally shown^13 that the third new labeling operation execution $L_x^{8,m,k}$, after x and y moved at least 3 times, occurred completely after the initial labeling of y, that is, $L_y^{1,m,k} \longrightarrow L_x^{8,m,k}$.

Formal proofs will be presented in the full paper. As a final comment, note that for algorithms where only the maximum label is required, and not a complete order among all returned labels (like in construction of a mrmw atomic register or solutions to the mutual exclusion problem), only one phase of label collection is required, that is, only 8 log n collects^14.
A  Some Examples of Applications

The following is a simple unbounded algorithm for solving the famous problem of constructing a mrmw atomic register from swmr atomic registers. This solution is a version (due to Li and Vitanyi [LV88]) of the elegant and simple unbounded Vitanyi-Awerbuch algorithm [VA86]. It is based on the use of a natural number ctss. Each process i writes to a mrsw atomic register denoted $v_i$. Each register contains two fields, a label, that is, a natural number, and a value associated with it (value@label using the notation of [LV88]). The following is an implementation of the read and write by a process i.


function read;
begin
    read $v_1, .., v_n$;
    select the maximal timestamp $\ell_j$;
    return value@$\ell_j$;
end;

procedure write(value);
begin
    read $v_1, .., v_n$;
    select the maximal timestamp $\ell_j$;
    write into $v_i$ the value and $\ell_j + 1$;
end;

Note that the write operation is just a labeling, and the read is a scan followed by returning the value associated with the maximal label. As mentioned earlier, one would need to let the labels of the ctss include their associated values. Replacing the above unbounded operations by the Labeling and Scan operations of the bounded concurrent-time-stamp system will immediately produce a bounded solution to the problem. Note again that the general implementation of the scan operation, as described in this extended abstract, requires 8n log n collects, but since only the maximum (and not a total ordering) of the labels is required, it can be reduced to 8 log n collects, as will be elaborated upon in the full paper.
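The unbounded read and write can be sketched in Python as follows. This is a sequential illustration under stated assumptions: the class and method names are hypothetical, the per-process registers are modeled as a plain list with no atomicity or concurrency, and ties between equal labels are broken by process id, which is a choice made here for the sketch.

```python
class UnboundedMrmwRegister:
    """Sequential sketch of the natural-number-label register."""

    def __init__(self, n, initial=None):
        # One (label, value) pair per process; every label starts at 0.
        self.v = [(0, initial) for _ in range(n)]

    def _max_entry(self):
        # Maximal timestamp; equal labels are ordered by process id
        # (an assumption of this sketch).
        i = max(range(len(self.v)), key=lambda j: (self.v[j][0], j))
        return self.v[i]

    def read(self):
        """Read all registers and return the value with the maximal label."""
        _, value = self._max_entry()
        return value

    def write(self, i, value):
        """Process i reads all registers, then writes value with max label + 1."""
        label, _ = self._max_entry()
        self.v[i] = (label + 1, value)
```

The label stored by each write grows without bound, which is exactly what the bounded Labeling and Scan operations are meant to replace.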

The following is a fifo solution to the $l$-exclusion problem due to [ADMS88], based on the use of a ctss. In the following, the scan and label operations of process i are as described, where the ctss is implemented using swmr atomic registers, and $x_i$, $i \in \{1,..,n\}$ are swmr safe registers.

do forever
    $x_i$ := true;
    labeling;
L:  ($\ell$, $\prec$) := scan;
    if $|\{j \mid x_j \wedge (\ell_j \prec \ell_i)\}| \geq l$ then goto L fi;
    critical section
    $x_i$ := false;
    remainder section
od;

The  only  known  bounded  fifo  solution  to  the 
problem,  due  to  (FLBB79)  was  based  on  the  use 
of  a  strong  form  of  Test  and  Set.  It  was  un¬ 
known  whether  a  level  of  fairness  higher  than  n3- 
waiUng  (see  [DGS88])  without  use  of  test  and  set 


can be achieved. It is interesting to note that the amount of shared memory needed meets the lower bound of [FLBB79]. If one is interested in the unbounded implementation, just substitute $\ell_i := max(\ell_1, .., \ell_n) + 1$ for the labeling operation, and $read(\ell_1, .., \ell_n)$ for the scan. Notice that for $l = 1$, the above is a very simple solution to the fundamental mutual exclusion problem of [D65]. Other algorithms, such as the unbounded implementation of a ctss in the Bakery Algorithm of Lamport [L74], can also be modularly replaced, and by adding a simple modification to allow the ctss to include restarts, the solution can be made resilient to restart failures [L74, L86d].
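The entry test of the exclusion loop, with the unbounded substitution mentioned above, can be sketched in Python as follows. The function names are hypothetical, the shared flags and labels are modeled as plain lists without concurrency, and equal labels are ordered by process id (an assumption of this sketch).

```python
def take_label(i, labels):
    """Unbounded labeling: process i takes max(all labels) + 1."""
    labels[i] = max(labels) + 1

def must_retry(i, flags, labels, l):
    """True if at least l flagged processes hold labels smaller than i's."""
    ahead = sum(
        1
        for j in range(len(labels))
        if j != i and flags[j] and (labels[j], j) < (labels[i], i)
    )
    return ahead >= l
```

A process whose `must_retry` result is true goes back to the scan at label L; once enough earlier processes reset their flags, the count drops below $l$ and the process proceeds to its critical section.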
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